Clustering Uncertain Graphs

Authors

  • Matteo Ceccarello
  • Carlo Fantozzi
  • Andrea Pietracaprina
  • Geppino Pucci
  • Fabio Vandin
Abstract

An uncertain graph G = (V, E, p : E → (0, 1]) can be viewed as a probability space whose outcomes (referred to as possible worlds) are subgraphs of G where any edge e ∈ E occurs with probability p(e), independently of the other edges. These graphs arise naturally in many application domains where data management systems are required to cope with uncertainty in interrelated data, such as computational biology, social network analysis, network reliability, and privacy enforcement, among others. For this reason, it is important to devise fundamental querying and mining primitives for uncertain graphs. This paper contributes to this endeavor with the development of novel strategies for clustering uncertain graphs. Specifically, given an uncertain graph G and an integer k, we aim at partitioning its nodes into k clusters, each featuring a distinguished center node, so as to maximize the minimum/average connection probability of any node to its cluster’s center, in a random possible world. We assess the NP-hardness of maximizing the minimum connection probability, even in the presence of an oracle for the connection probabilities, and develop efficient approximation algorithms for both problems and some useful variants. Unlike previous works in the literature, our algorithms feature provable approximation guarantees and are capable of keeping the granularity of the returned clustering under control. Our theoretical findings are complemented with several experiments that compare our algorithms against some relevant competitors, with respect to both running time and quality of the returned clusterings.
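
To make the possible-world semantics concrete, the following is a minimal Python sketch that estimates the connection probability between a center and every node by sampling possible worlds. It is a generic Monte Carlo estimator written for illustration, not the authors' algorithm; the function names, the `(u, v, p)` edge format, and the `samples` and `seed` parameters are all assumptions made here.

```python
import random
from collections import deque


def sample_world(edges, rng):
    """Draw one possible world: keep each edge (u, v, p) independently with probability p."""
    adj = {}
    for u, v, p in edges:
        if rng.random() < p:  # random() is in [0, 1), so p = 1 always keeps the edge
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    return adj


def reachable(adj, source):
    """Return the set of nodes connected to source in one sampled world (BFS)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def estimate_connection_probabilities(nodes, edges, center, samples=2000, seed=0):
    """Estimate, for each node, the probability of being connected to center."""
    rng = random.Random(seed)
    hits = {v: 0 for v in nodes}
    for _ in range(samples):
        for v in reachable(sample_world(edges, rng), center):
            hits[v] += 1
    return {v: hits[v] / samples for v in nodes}


# Hypothetical toy graph: a path a - b - c with one certain and one uncertain edge.
nodes = ["a", "b", "c"]
edges = [("a", "b", 1.0), ("b", "c", 0.5)]
estimates = estimate_connection_probabilities(nodes, edges, center="a")
worst = min(estimates.values())  # minimum connection probability of this one-cluster solution
```

On this toy graph, node `b` is always connected to the center `a`, while the estimate for `c` should be close to 0.5, so `worst` approximates the minimum connection probability that the clustering objective seeks to maximize.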

Similar resources

Clustering of Fuzzy Data Sets Based on Particle Swarm Optimization With Fuzzy Cluster Centers

In the current study, a particle swarm clustering method is suggested for clustering triangular fuzzy data. The proposed method can find fuzzy cluster centers; the more points from the corresponding cluster a fuzzy cluster center contains, the higher the clustering accuracy. Also, triangular fuzzy numbers are utilized to represent uncertain data. To compare triangular fuzzy ...

Fast Reliability Search in Uncertain Graphs

Uncertain, or probabilistic, graphs have been increasingly used to represent noisy linked data in many emerging application scenarios, and have recently attracted the attention of the database research community. A fundamental problem on uncertain graphs is reliability, which deals with the probability of nodes being reachable one from another. Existing literature has exclusively focused on rel...

Uncertain Graph Sparsification

Uncertain graphs are prevalent in several applications including communications systems, biological databases and social networks. The ever-increasing size of the underlying data renders both graph storage and query processing extremely expensive. Sparsification has often been used to reduce the size of deterministic graphs by maintaining only the important edges. However, adaptation of determi...

Finding Community Base on Web Graph Clustering

Search Pointers organize the main part of the applications on the Internet. However, because of information management hardware, the high volume of data, and word similarities across different fields, most answers to users' questions aren't correct. Web graph clustering and the placement of clusters in the corresponding answers therefore help users reach their intended results. Community (web communit...

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with a huge volume of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but it offers no way to select the number of clusters and the number of members ...

Journal:
  • PVLDB

Volume 11

Publication date: 2017